Results 1 - 4 of 4
1.
Comput Stat ; 38(2): 647-674, 2023.
Article in English | MEDLINE | ID: covidwho-2327032

ABSTRACT

Topic models are a useful and popular method to find latent topics of documents. However, the short and sparse texts in social media micro-blogs such as Twitter are challenging for the most commonly used Latent Dirichlet Allocation (LDA) topic model. We compare the performance of the standard LDA topic model with the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data. To compare the performance of the three models, we propose the simulation of pseudo-documents as a novel evaluation method. In a case study with short and sparse text, the models are evaluated on tweets filtered by keywords relating to the Covid-19 pandemic. We find that standard coherence scores that are often used for the evaluation of topic models perform poorly as an evaluation metric. The results of our simulation-based approach suggest that the GSDMM and GPM topic models may generate better topics than the standard LDA model.

2.
Computational statistics ; : 1-28, 2022.
Article in English | EuropePMC | ID: covidwho-2286519

ABSTRACT

Topic models are a useful and popular method to find latent topics of documents. However, the short and sparse texts in social media micro-blogs such as Twitter are challenging for the most commonly used Latent Dirichlet Allocation (LDA) topic model. We compare the performance of the standard LDA topic model with the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data. To compare the performance of the three models, we propose the simulation of pseudo-documents as a novel evaluation method. In a case study with short and sparse text, the models are evaluated on tweets filtered by keywords relating to the Covid-19 pandemic. We find that standard coherence scores that are often used for the evaluation of topic models perform poorly as an evaluation metric. The results of our simulation-based approach suggest that the GSDMM and GPM topic models may generate better topics than the standard LDA model.

3.
J R Stat Soc Ser A Stat Soc ; 2022 Jul 18.
Article in English | MEDLINE | ID: covidwho-1937990

ABSTRACT

A rapid response to global infectious disease outbreaks is crucial to protect public health. Ex ante information on the spatial probability distribution of early infections can guide governments to better target protection efforts. We propose a two-stage statistical approach to spatially map the ex ante importation risk of COVID-19 and its uncertainty across Indonesia based on a minimal set of routinely available input data related to the Indonesian flight network, traffic and population data, and geographical information. In a first step, we use a generalised additive model to predict the ex ante COVID-19 risk for 78 domestic Indonesian airports based on data from a global model on the disease spread and covariates associated with Indonesian airport network flight data prior to the global COVID-19 outbreak. In a second step, we apply a Bayesian geostatistical model to propagate the estimated COVID-19 risk from the airports to all of Indonesia using freely available spatial covariates including traffic density, population and two spatial distance metrics. The results of our analysis are illustrated using exceedance probability surface maps, which provide policy-relevant information accounting for the uncertainty of the estimates on the location of areas at risk and those that might require further data collection.

4.
J R Stat Soc Ser A Stat Soc ; 185(1): 202-218, 2022 Jan.
Article in English | MEDLINE | ID: covidwho-1575364

ABSTRACT

As the COVID-19 pandemic continues to threaten various regions around the world, obtaining accurate and reliable COVID-19 data is crucial for governments and local communities aiming to rigorously assess the extent and magnitude of the virus spread and to deploy efficient interventions. Using data reported between January and February 2020 in China, we compared counts of COVID-19 from near-real-time spatially disaggregated data (city level) with fine-spatial scale predictions from a Bayesian downscaling regression model applied to a reference province-level data set. The results highlight discrepancies in the counts of coronavirus-infected cases at the district level and identify districts that may require further investigation.
